Reason Why

“At the beginning of FY2020, the Marketing, Comms, and Sales teams were challenged by a new business objective: High Velocity Merchants (HVM). After initial market research, we realized the potential of this new target and the difficulty of identifying these companies with the tools available at the time. As the information required (company name, investment stage, founders, events, associations, etc.) was mostly open access, we decided to use data analytics in our favor to create a project that could benefit the marketing and communications efforts with regard to lead generation, content topics, advertising strategies, brand awareness and events mapping.”

  • Estefanía Granados. Marketing Specialist.

“On top of the mountain of our ambitions, we are looking to make PayU, within the next 3 years, the number one payment company in the fast-growing markets, as well as the number one player in full financial services in specific regions, and to build our own ecosystem. Our goals are clear, and we know that we will only achieve this by focusing our efforts on innovation. What does it mean? This means making PayU a data-driven company and putting data at the core of our strategies. It means having a team that experiences a data culture and is not limited to its original area of knowledge. Through this project we are changing our mindset. We are taking Communication and Marketing out of the “support areas” position, and turning them into market intelligence entry doors. Here we are planting a seed of Communication and Marketing Intelligence that can guide leaders and teams in wiser business decisions. ‘Decoding’ who HVMs are and how they think, is just the beginning of what we can offer in terms of stronger targets, KPIs, and actions.”

  • Fabiana Paiva. Communication Manager Latam.

“This mapping exercise not only allows us to better understand the customer profiles we are looking for; it also gives us a concrete guideline to establish an effective communication and resource investment strategy in order to attract them.”

  • Ángela Bohorquez. SSC Marketing Manager.


Objectives

High Velocity Merchants (HVM) were initially conceived as companies valued at over 1 billion dollars that had not executed a merger or an Initial Public Offering (IPO). They were usually known for having passed through multiple funding rounds and being at a late investment stage.

As the concept was relatively new and there was ambiguity about what it means to be “at a late investment stage”, we decided to map available companies at all investment stages.

This project is an effort to identify HVMs’ behavioural and topic patterns within the global entrepreneurship ecosystem at an advanced investment level, through their points of interaction. Its main objectives are:

  • Identify who the HVMs are, where they are and which events they attend.
  • Identify the conversational topics they reveal on social media.

To pursue these objectives, Fidelio conducted digital research based on websites dedicated to gathering information about companies and events. Angel.co, Crunchbase, 10times and Twitter are the main data sources.

These data sources were used to consolidate a worksheet database with information about:

  • Companies
  • Founders
  • Venture Capitalists
  • Events
  • Institutions

Twitter was used as the main data source for social media. Its content served to model the latent topics in all of the HVMs’ conversations available online.


Methodology


  • Name of the project: HVM Ecosystem Mapping.
  • Data collection date: 15 Jun 2019 - 17 Aug 2019.
  • Company responsible for the study: FIDELIO DIGITAL S A S.
  • Company sponsoring the study: PAYU.
  • Objective group:
    • Companies in multiple investment stages.
    • Founders of those companies.
    • Venture capitalists investing in those companies.
    • Events worldwide.
    • Institutions organizing those events.
  • Sample design: targeted unweighted sampling.
  • Sample analysis: Cross Industry Standard Process for Data Mining (CRISP-DM).
  • Sample framework: angel.co, crunchbase.com, 10times.com, twitter.com.
  • Sample size:
    • 4,205 companies.
    • 5,540 founders.
    • 487 venture capitalists.
    • 8,080 events.
    • 2,155 institutions.
  • Data collection technique: web scraping.
  • Geographical scope: worldwide.
  • Error margin: does not apply.
  • Delivery report date: Aug 19, 2019.


This work has been developed thanks to open source technologies. Python: pandas, numpy, matplotlib, wordcloud, nltk, sklearn, collections, gensim, bokeh. R: knitr, readxl, data.table, plotly, RColorBrewer, DT, bubbles. JavaScript, HTML and CSS.

Glossary

  • Word cloud: An electronic image that shows words used in a particular piece of electronic text or series of texts. The words are different sizes according to how often they are used in the text. Each Word Cloud in this document shows the semantic root of every word. We grouped words by root to preserve consistency between them. For example, “developer”, “development” and “develop” all map to the same root word: “develop”. Source
  • Topic modelling: In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Source
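The root grouping described in the Word cloud entry can be sketched as follows. This is a minimal illustration only: `toy_stem` and its suffix list are hypothetical simplifications written for this example, not the stemmer used in the study (the report lists nltk among its tools, which ships full stemmers).

```python
from collections import Counter

# Hypothetical suffix list for illustration; a real project would use a
# proper stemmer (for example one of the stemmers shipped with nltk).
SUFFIXES = ("ment", "ers", "er", "ing", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 4 characters of root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def root_frequencies(words):
    """Count how often each root appears; these counts would drive word size."""
    return Counter(toy_stem(w) for w in words)


# Made-up sample words, including the three from the glossary example.
sample = ["developer", "development", "develop", "platform", "platforms"]
freqs = root_frequencies(sample)
```

In this sketch the three "develop" variants collapse into a single root, so that root is counted three times and would be drawn larger than a word that appears only once.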


Mapping Tables


Use the navigation bar on your left to step across the tables or select one of the links below:

  1. Companies. Total of 4,205 companies.
  2. Founders. Total of 5,540 founders.
  3. Venture capitalists. Total of 487 venture capitalists.
  4. Events & Institutions. Total of 8,080 events and 2,155 institutions.


Companies

A total of 4,377 companies were mapped, including 172 categorized as Closed in the “Operating Status” column. These closed companies were kept in the database but excluded from this analysis, leaving a total of 4,205.
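The exclusion step can be sketched in a few lines. The records below are hypothetical; only the “Operating Status” column name, the “Closed” label and the totals come from the report, and the "Active" label is an assumption made for the example.

```python
def exclude_closed(companies, status_column="Operating Status"):
    """Return only the companies whose operating status is not 'Closed'."""
    return [c for c in companies if c.get(status_column) != "Closed"]


# Hypothetical records used only to exercise the filter.
companies = [
    {"name": "Example One", "Operating Status": "Active"},
    {"name": "Example Two", "Operating Status": "Closed"},
    {"name": "Example Three", "Operating Status": "Active"},
]
analysed = exclude_closed(companies)

# The same subtraction with the totals quoted in the text.
MAPPED, CLOSED = 4377, 172
remaining = MAPPED - CLOSED
```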



Among the main words identified in the companies’ descriptions are “platform”, “provid” (meaning provider), “onlin”, “develop” and “servic”. These words imply that the companies are mostly related to online platforms that provide services and solutions with technology and data, i.e. they are mainly digital companies.

In addition to the columns shown in the table, we gathered 94 columns for each company. Among the most important are: Categories, Funding Status, IPO Status, Facebook, LinkedIn, Twitter, Contact Email, Phone Number, Description, Total Funding Amount, Number of Funding Rounds, Number of Investors, Number of Lead Investors, Number of Current Team Members, Number of Articles and Number of Events, among many others.


Verticals

In order to give better insights, all categories in this report were merged with PayU’s main verticals.

Additional categories were added in order to preserve consistency among companies and between industries.

A total of 352 companies (8.4%) did not have a category, so they were excluded from the pie chart.

PayU’s main verticals represent 69% of the categories in the database. Most of them are Digital Services (28.2%), Direct Selling (17.5%) and Software (16.6%).

The categories with the highest average revenue are Aerospace ($653M USD), Digital Services ($471M USD) and Fintech ($169M USD). Since Aerospace does not have a significant number of companies (10), the category with the most sales, on average, is Digital Services.

With regard to monthly visitors, Software is significantly higher than the other categories, with an average of 67 million monthly visits. The top companies are YouTube (24 billion), Quora (589 million) and Zhihu (286 million).

Government and NGOs have the highest number of tech products. However, the number of companies working in those categories is not representative (11 and 4). Setting those aside, the category with the most tech products is Uber Model Sharing (31).


Location - Country

84.3% of the companies mapped are located in the United States (75.2%), United Kingdom (4%), China (3%) and India (2.1%).

For each country, we mapped the average annual USD revenue, monthly visitors, technological products, team members, events, articles (made by them), number of funding rounds and number of investors. Countries with a relatively low number of companies (fewer than 9) are excluded from this specific analysis.
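A per-country average with this minimum-size rule could look like the sketch below. The function name, the field names (`country`, `monthly_visits`) and the sample rows are hypothetical; only the threshold of 9 companies comes from the text.

```python
from collections import defaultdict

MIN_COMPANIES = 9  # threshold from the text: countries with fewer are skipped


def average_by_country(rows, metric, min_companies=MIN_COMPANIES):
    """Average `metric` per country, skipping countries with too few companies."""
    counts = defaultdict(int)   # companies mapped per country
    values = defaultdict(list)  # non-missing metric values per country
    for row in rows:
        counts[row["country"]] += 1
        if row.get(metric) is not None:
            values[row["country"]].append(row[metric])
    return {
        country: sum(vals) / len(vals)
        for country, vals in values.items()
        if counts[country] >= min_companies and vals
    }


# Hypothetical rows: 9 companies in "Country A", 2 in "Country B".
rows = [{"country": "Country A", "monthly_visits": 10.0 * i} for i in range(1, 10)]
rows += [{"country": "Country B", "monthly_visits": 500.0} for _ in range(2)]
averages = average_by_country(rows, "monthly_visits")
```

Here "Country B" is dropped even though its average would be the highest, which is the effect the threshold is meant to have: a couple of outliers do not dominate the ranking.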

Finland ($266M USD), India ($258M USD) and United States ($245M USD) lead in average revenue per company.

With regard to average monthly visitors, Indonesia takes the lead with 48M visits. The number of companies in Indonesia is relatively small (9), so it makes sense to zoom in on the most visited company: Bukalapak. It is an e-commerce company with 82M monthly visits.

The use of technological products inside these companies is vital to guarantee a competitive advantage. Indonesia has the most; the next country, with a relatively high number of companies (81), is India. India is known worldwide for its computational capabilities, and proof of that is the number of technological products each company uses on average: 28.

Belgium and Israel lead in average team members (10 and 9, respectively), while Switzerland and the US lead in number of Board Members (6 and 4).

On average, the countries whose companies attend the most events are United States (6), Sweden (6), France (6) and Belgium (6). This insight can serve as an input to tune the worldwide events participation strategy towards these countries.

Finally, the countries with the most investors, on average, are Sweden (14), Singapore (11) and United States (9).

Location - PayU Regions

On average, India leads in annual revenue with $258.7M USD, followed by USA and Canada with $238.8M USD and Asia with $98.4M USD. Not surprisingly, USA/CA and India lead all regions in average monthly visitors (18.6M and 9.7M respectively).

India and Asia lead in average funding rounds (6 and 4.4), while India, Brazil and USA/CA lead in average investors per company (9).


Funding

54.8% of companies are part of PayU’s main focus as they are passing through funding stages (Early and Late Stage Venture). Merged & Acquired (M&A) companies account for 36.1% of the database.

Public companies (IPO), those listed on a stock exchange, represent 1.74% of the data.

We decided to take all companies into consideration for the analysis, since the definition of “High Velocity Merchants” is still being tested.

Companies with Private Equity are those that have not passed through an investment stage, so they are financed by their own equity and debt. They range from small and medium businesses to big private companies. The average number of investors in this type of company is 7. Even though they represent 1.26% of the companies in the database, they account for 34% (736M USD) of the Funding Amount mapped.


Team

53% of companies in the database have between 11 and 200 employees and 14% have between 201 and 5,000 employees. Very few (44) have more than 5,000 employees and most of them are in the United States.

The difference between Employees and Team Members is that the latter is the leadership team (VPs, managers, C-level employees).

Companies with between 501 and 1,000 employees attend the most events (13 per company, on average), while companies with between 1,001 and 5,000 employees publish the most articles, on average.


Contact Information

This bar chart shows how complete the contact information in the database is, since not all data is public or centralized.


Founders



Founders describe themselves by their job role (such as “CEO”, “CTO”, “entrepreneur” or “investor”), their work approach (“work”, “design”, “technolog”, “develop”) and their academic experience (“studi”, “stanford university”, “University California”).


Top Influence Founder

This bubble chart represents the founders with the most connections on Angel.co and the most followers on Twitter. It is filtered to those with more than 600 connections to facilitate the visualization.

  • The color of the bubble represents the number of companies a founder has, according to Angel.co.
  • The size of the bubble represents a weighted indicator that considers 30% of Twitter followers and 70% of Angel.co connections. This distribution gives more weight to Angel.co because this data source is considered to have less noise with regard to personal connections and it is used by professionals to find jobs. In addition, between 5 and 30% of Twitter followers are fake (bots, spam accounts, inactive users, propaganda, or other non-engaged or non-real users), according to Sparktoro.com.
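One possible reading of the bubble-size indicator is sketched below. The report gives the 30/70 weights and the 600-connection filter but not the exact formula, so the plain weighted sum of raw counts is an assumption, and the function names, field names and founder records are hypothetical.

```python
MIN_CONNECTIONS = 600  # chart filter quoted in the text


def influence_score(twitter_followers, angel_connections,
                    w_twitter=0.3, w_angel=0.7):
    """Weighted sum of raw counts (assumed form; the report states only the weights)."""
    return w_twitter * twitter_followers + w_angel * angel_connections


def chart_rows(founders, min_connections=MIN_CONNECTIONS):
    """Keep founders above the connection filter and rank them by score."""
    rows = [
        dict(f, score=influence_score(f["twitter_followers"], f["angel_connections"]))
        for f in founders
        if f["angel_connections"] > min_connections
    ]
    return sorted(rows, key=lambda r: r["score"], reverse=True)


# Hypothetical founders; names and numbers are made up for the example.
founders = [
    {"name": "Founder A", "twitter_followers": 20000, "angel_connections": 700},
    {"name": "Founder B", "twitter_followers": 1000, "angel_connections": 3000},
    {"name": "Founder C", "twitter_followers": 90000, "angel_connections": 500},
]
ranked = chart_rows(founders)
```

With raw counts, a large Twitter following can still outweigh the Angel.co term because the two scales differ, so a real implementation might normalize both inputs before weighting. Whether the study did so is not stated.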

Of the 5,540 founders in this study, Angel.co data shows that 444 are owners of more than one merchant. David Gutelius, David Cancel and Harj Taggar are among the most influential with more than one company. Regardless of the number of companies, Paolo Privieta, Richard Titus, Micah Badwin and Danielle Morill would be the most influential founders according to the data.


Contact Information

Founders’ personal contact details, such as email or mobile number, are usually hidden in their social media profiles. LinkedIn is the channel with the most structured information and the most business-oriented use, so getting in touch through Sales Navigator (LinkedIn) is highly recommended.


Venture Capitalists



Venture Capitalists describe themselves in relation to investment, as venture capitalists, and in relation to the kind of enterprise they are looking for (startup, early stage). They show neither their academic or work background nor the origin of their funds.


Location

USA, India, China, Russia and UK account for 60% of Venture Capitalists. Most of them are in the USA (33%).


Employees

76% of venture capital firms in the database have between 1 and 10 employees. Very few (7) have more than 5,000 employees and most of them are in the United States.


Events & Institutions

Events happening in the next months:


Locations

USA, Canada, China and India host 3,809 events (48% of all global events mapped).


Dates

Because all events were gathered during July and August of 2019, most of the events fall within those dates. For the following year, April has the peak with 167 events.


Verticals

The industries with the most events are HR, Jobs & Career, Antiques & Philately, Veterinary, Aerospace & Telecommunication. The events with the most visitors are Gifts & Gifting, Fashion & Beauty, Architecture & Designing, IT & Technology and Business Services.


Visitors by Verticals & Regions

In all regions, Direct Selling is the category with the most visitors. Travel is the second most visited category in Asia and India, and Agtech & Food is the second most visited category in EMEA and SSC.


Institutions

2,155 institutions are in USA, Canada, India and China.


Companies’ content



Most of the content generated by companies is related to development, software, technology, money, businesses, sites and engagement, and part of it is related to women in tech as well.


Verticals’ social status

As the number of companies in Aerospace and Government is relatively low (10), we are not going to take them into consideration.

When the average number of followers on Twitter (Avg Followers) is ordered from highest to lowest, Software and Advertising are at the top, just as they are for average monthly visitors. However, when average annual USD revenue is considered, Digital Services, Travel and Fintech lead regardless of the Twitter metrics. This suggests that the number of followers is not completely tied to annual revenue.

Regions’ social status

While average annual USD revenue seems to increase with the average number of Twitter followers, India stands out as an atypical value: it has a significantly lower number of Twitter followers yet more revenue, on average, than USA and Canada (USA/CA). This might imply that Twitter is not as relevant in India as it is in North America and Asia.


Content Clustering

Plot

For each company, we took the content of all its tweets and mapped it onto a cluster dispersion plot. Every cluster is built based on word interactions and content.


Topic Modelling

This visualization represents the topic model obtained after processing the content of the companies’ Twitter accounts. We chose 40 topics as this value gives a relatively good coherence metric. Each company’s Twitter content is called a “document”, and every word is considered a “token”. This view is divided into two sections:

  1. On the left, there is a Cartesian map showing the topics that emerged from all documents. The buttons on the top left help navigate through every topic. The size of each topic represents its margin among all documents. For example, a margin of 10% would imply that this topic captures 10% of all content mapped. The x and y axes position the topics by calculating their principal components, which is a way to represent the most salient terms of each topic on a two-dimensional scale that is easy to visualize. As most of the topics are located on the right side of the plane, it is implied that most of the words and content are located in that area.
  2. On the right, the most salient terms of each topic and of all documents are shown. By default, the most relevant words are shown for all content. If you hover the mouse over a topic on the left, its top 30 most salient words will be highlighted on the right. The control panel on the top right sets the relevance metric, which determines how relevant a word (token) is with respect to a topic (lambda = 0) or with respect to all documents (lambda = 1). If lambda is close to 0, the words highlighted will be more associated with that specific topic than with all documents. The recommended lambda value is 0.6.

For each topic revealed, the words most relevant to that topic are highlighted. Use the control on the top left to navigate between topics and the control on the right to adjust how relevant each word is to every topic. The relevance metric is set to 1 by default, so that all words are considered relevant to all topics.
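The lambda control can be illustrated with the relevance formulation of Sievert and Shirley, which matches the interface described here: relevance = lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)). That this exact formula drives the report's visualization is an assumption, and the probabilities below are made up for the example.

```python
import math


def relevance(p_word_given_topic, p_word, lam=0.6):
    """Relevance of a term to a topic for a given lambda in [0, 1]."""
    lift = p_word_given_topic / p_word  # how much more likely the word is inside the topic
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)


def rank_terms(term_probs, lam=0.6):
    """Sort terms from most to least relevant for the chosen lambda."""
    return sorted(
        term_probs,
        key=lambda term: relevance(*term_probs[term], lam),
        reverse=True,
    )


# Hypothetical (p(word | topic), p(word)) pairs for two tokens.
term_probs = {
    "help": (0.05, 0.04),     # frequent everywhere, barely topic-specific
    "hrtech": (0.01, 0.001),  # rare overall, concentrated in this topic
}
by_probability = rank_terms(term_probs, lam=1.0)  # ranks by p(word | topic) only
by_lift = rank_terms(term_probs, lam=0.0)         # ranks by lift only
```

In this sketch a lambda of 1 puts the generally frequent word first, while a lambda of 0 puts the topic-specific word first; intermediate values such as 0.6 trade the two criteria off.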


Main topics analysis

  • Topic 1: Related to daily work (year, day, time) and terms such as “help”, “great” and “learn”. Might be related to their intention to help their customers using great products.
  • Topic 2: Related to market performance in their industry. Main words: “market”, “busi”, “mobil”, “brand”, “digit”.
  • Topic 3: Related to E-Commerce, Artificial Intelligence (AI), advertising (email), data and retail. Might be highlighting the impact of automated systems on the retail industry.

These first 3 topics are related to digital businesses, E-Commerce, Artificial Intelligence (AI) and their impact across all industries. The companies are talking from their own perspective about how to use their services to solve problems.

  • Topic 4: The word “women” emerges in this specific topic. As this topic is close to the first two, it implies that companies are also talking about women in tech nowadays (woman, team, today, time).
  • Topic 5: This topic is far from the others and is more related to customer service and how they pursue solving problems. Main words: “pleas(e)”, “help”, “team”, “sorr(y)”, “support”, “contact”.
  • Topic 6: The previous topic was about solving problems. This topic is related to the feelings associated with those solved problems: “love”, “design”, “live”, “style”, “happ(y)”, “beaut(y)”.

In a way, as they are “shouting” how they want to solve their customers’ problems, this can serve as an input on how PayU should communicate with them and which terms to use in that contact.

Most of the other topics are close together, so depending on PayU’s strategy, one topic might work better than another.

  • Topic 20: Relatively far from the others, topic 20 shows signs of human resources and an interest in hiring new talent in the digital industry: “mobil(e)”, “app”, “network”, “hire”, “recruit”, “manag(ement)” and “hrtech_hr”.


Next Steps

Mapping the HVM ecosystem provides input for defining strategies within PayU’s core business:

  1. Marketing and communication campaigns in the languages, markets, events and industries that best match the target companies.
  2. Sets the parameters for event participation and for profiling PayU’s speakers on each vertical, and also suggests a direct meeting agenda with Founders, VCs and institutions at each event.
  3. Highlights the main conversation topics among HVMs so that communication can be fluid and tracked in media to improve Public Relations and Communications actions.
  4. The main conversation topics provide insights for making digital marketing content accurate to their interests and topics.
  5. Identifies Founders, Venture Capitalists and Institutions that are top influencers in the target ecosystem, making them desirable allies to work with.
  6. Boosts the commercial teams’ lead database, segmented by venture stage, industry or geographic origin.